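Machine learning and ranking-based systems are widely used in sensitive decision-making processes (e.g., shortlisting job candidates, assigning credit scores), yet they raise serious concerns about unintended bias in their outcomes, which makes algorithmic fairness (e.g., demographic parity, equal opportunity) an important objective. "Algorithmic recourse" offers feasible recovery actions that change an undesirable outcome by modifying attributes. We introduce the notion of ranked group-level recourse fairness and develop a "recourse-aware ranking" solution that satisfies ranked recourse fairness constraints while minimizing the cost of the suggested modifications. Our solution suggests interventions that reorder a ranked list of database records and mitigate group-level unfairness, specifically the disproportionate representation of subgroups and imbalance in recourse cost. The re-ranking identifies the minimal modifications to data points, with attribute modifications weighted according to their ease of recourse. We then present an efficient block-based extension that enables re-ranking at any granularity (e.g., multiple brackets of bank loan interest rates, multiple pages of search engine results). Evaluation on real datasets shows that, while existing methods may even exacerbate recourse unfairness, our solution, RAGUEL, significantly improves recourse-aware fairness. RAGUEL outperforms alternative approaches to improving recourse fairness through a combined process of counterfactual generation and re-ranking, while remaining efficient on large datasets.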
translated by 谷歌翻译
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
It is crucial to protect the intellectual property rights of DNN models prior to their deployment. The DNN should perform two main tasks: its primary task and the watermarking task. This paper proposes a lightweight, reliable, and secure DNN watermarking scheme that attempts to establish strong ties between these two tasks. The samples that trigger the watermarking task are generated using image Mixup from either training or testing samples. As a result, the set of possible triggers is effectively unlimited and is not restricted to the samples used to embed the watermark in the model during training. Extensive experiments on image classification models for different datasets, including exposure to a variety of attacks, show that the proposed watermarking provides protection with an adequate level of security and robustness.
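As a rough illustration of how Mixup can yield an effectively unlimited supply of trigger samples, the sketch below blends two images with a random mixing weight. It is a minimal sketch under stated assumptions, not the scheme from the abstract: images are flat lists of floats, the weight is drawn from a Beta distribution as in standard Mixup, and the names `mixup`, `sample_trigger`, and `alpha` are hypothetical. How trigger samples are labeled is not described in the abstract and is left out.

```python
import random


def mixup(image_a, image_b, lam):
    """Return the convex combination lam * image_a + (1 - lam) * image_b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]


def sample_trigger(images, rng, alpha=1.0):
    """Draw two distinct images and blend them with a Beta(alpha, alpha) weight.

    Assumption: any pair of in-distribution images may be mixed; the abstract
    only states that triggers come from training or testing samples.
    """
    image_a, image_b = rng.sample(images, 2)
    lam = rng.betavariate(alpha, alpha)
    return mixup(image_a, image_b, lam)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_images = [[0.0, 0.0, 1.0], [1.0, 0.5, 0.0], [0.2, 0.9, 0.4]]
    print(sample_trigger(toy_images, rng))
```

Because the mixing weight is continuous, each pair of source images can produce arbitrarily many distinct blends, which is the property the abstract relies on.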
Likelihood-based deep generative models have recently been shown to exhibit pathological behaviour under the manifold hypothesis as a consequence of using high-dimensional densities to model data with low-dimensional structure. In this paper we propose two methodologies aimed at addressing this problem. Both are based on adding Gaussian noise to the data to remove the dimensionality mismatch during training, and both provide a denoising mechanism whose goal is to sample from the model as though no noise had been added to the data. Our first approach is based on Tweedie's formula, and the second on models which take the variance of added noise as a conditional input. We show that surprisingly, while well motivated, these approaches only sporadically improve performance over not adding noise, and that other methods of addressing the dimensionality mismatch are more empirically adequate.
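For reference, Tweedie's formula states that if y = x + σε with ε standard Gaussian, then E[x | y] = y + σ² ∇_y log p(y), where p is the density of the noisy data. The sketch below checks this identity numerically in the one-dimensional Gaussian case, where the posterior mean is known in closed form. It is only an illustration of the formula, not of the models in the abstract; in practice the score would come from a learned density model, and all function names here are hypothetical.

```python
def gaussian_score(y, mean, var):
    """Score function d/dy log N(y; mean, var) of a 1-D Gaussian density."""
    return -(y - mean) / var


def tweedie_denoise(y, noise_var, score):
    """Tweedie's formula: E[x | y] = y + noise_var * score(y)."""
    return y + noise_var * score(y)


def exact_posterior_mean(y, prior_mean, prior_var, noise_var):
    """Closed-form E[x | y] when x ~ N(prior_mean, prior_var) and
    y = x + noise with noise ~ N(0, noise_var)."""
    return (prior_var * y + noise_var * prior_mean) / (prior_var + noise_var)


if __name__ == "__main__":
    prior_mean, prior_var, noise_var = 1.0, 4.0, 1.0
    y = 3.0
    # The noisy marginal is N(prior_mean, prior_var + noise_var).
    marginal_var = prior_var + noise_var
    denoised = tweedie_denoise(
        y, noise_var, lambda v: gaussian_score(v, prior_mean, marginal_var)
    )
    print(denoised, exact_posterior_mean(y, prior_mean, prior_var, noise_var))
```

Both printed values agree, since the score of the noisy marginal carries exactly the information needed to recover the posterior mean.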
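It is crucial to protect the intellectual property rights of DNN models prior to their deployment. The methods proposed so far either require changes to internal model parameters or to the machine learning pipeline, or they fail to meet both security and robustness requirements. This paper proposes a lightweight, robust, and secure black-box DNN watermarking protocol that leverages cryptographic one-way functions together with the injection of in-task key-label pairs during training. These pairs are later used to prove ownership of the DNN model at test time. The main feature is that the value of the proof and its security are measurable. Extensive experiments on watermarking image classification models for various datasets, including exposure to a variety of attacks, show that the protocol provides protection while maintaining an adequate level of security and robustness.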
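One way a cryptographic one-way function could tie key samples to labels is sketched below. This is an assumption-laden illustration rather than the protocol from the abstract, which does not give these details: the label of each key sample is derived with HMAC-SHA256 from an owner secret, and ownership is scored as the fraction of key samples on which a black-box predictor returns the derived label. The names `derive_key_label`, `build_key_pairs`, and `ownership_score` are hypothetical.

```python
import hashlib
import hmac


def derive_key_label(secret, sample_bytes, num_classes):
    """Map a key sample to a class label using a keyed one-way function.

    Assumption: the label is the HMAC-SHA256 digest reduced modulo the number
    of classes. Without the secret, the labels cannot be reproduced.
    """
    digest = hmac.new(secret, sample_bytes, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_classes


def build_key_pairs(secret, samples, num_classes):
    """Pair every key sample with its derived label (used during training)."""
    return [(s, derive_key_label(secret, s, num_classes)) for s in samples]


def ownership_score(predict, key_pairs):
    """Fraction of key samples on which the black-box model agrees."""
    if not key_pairs:
        raise ValueError("at least one key pair is required")
    hits = sum(1 for sample, label in key_pairs if predict(sample) == label)
    return hits / len(key_pairs)


if __name__ == "__main__":
    secret = b"owner-secret"  # hypothetical owner key
    samples = [b"img-0", b"img-1", b"img-2"]  # stand-ins for image bytes
    pairs = build_key_pairs(secret, samples, num_classes=10)
    memorised = dict(pairs)  # toy "model" that learned every key pair
    print(ownership_score(lambda s: memorised[s], pairs))
```

A score far above the chance level of roughly 1 / num_classes is what would make the ownership claim quantifiable in a setup like this.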
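Automated machine learning (AutoML) frameworks have become important tools in the data scientist's arsenal, as they dramatically reduce the manual work devoted to building ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines, typically comprising feature engineering, model selection, and hyperparameter tuning steps, and finally output the best pipeline in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, so the overall AutoML running time becomes increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size rather than the configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, it uses a genetic-based algorithm to find a small yet representative data subset that preserves a particular characteristic of the full data. It then runs the AutoML tool on the small subset and, finally, refines the resulting pipeline by executing a restricted, much shorter AutoML process on the large dataset. Our experimental results on two popular AutoML frameworks, Auto-Sklearn and TPOT, show that SubStrat reduces their running times by 79% on average, with an average loss of less than 2% in the accuracy of the resulting ML pipeline.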
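The idea of searching for a small representative subset with a genetic algorithm can be sketched as follows. This is a toy illustration under stated assumptions, not the method from the abstract: the data are a single numeric column, the preserved characteristic is simply the mean (the abstract does not say which characteristic is used), and the names `mean_gap`, `crossover`, `mutate`, and `select_subset` are hypothetical.

```python
import random


def mean_gap(data, indices):
    """Fitness (lower is better): |mean of full data - mean of subset|."""
    full_mean = sum(data) / len(data)
    subset_mean = sum(data[i] for i in indices) / len(indices)
    return abs(full_mean - subset_mean)


def crossover(parent_a, parent_b, k, rng):
    """Child draws k distinct row indices from the union of both parents."""
    pool = list(set(parent_a) | set(parent_b))
    return rng.sample(pool, k)


def mutate(individual, n_rows, rate, rng):
    """Swap each index for an unused row with probability `rate`."""
    child = list(individual)
    used = set(child)
    for pos in range(len(child)):
        if rng.random() < rate:
            unused = [i for i in range(n_rows) if i not in used]
            if unused:
                replacement = rng.choice(unused)
                used.discard(child[pos])
                used.add(replacement)
                child[pos] = replacement
    return child


def select_subset(data, k, generations=50, pop_size=20, rate=0.1, seed=0):
    """Evolve index subsets of size k; the better half survives each round."""
    rng = random.Random(seed)
    n_rows = len(data)
    population = [rng.sample(range(n_rows), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: mean_gap(data, ind))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            child = crossover(parent_a, parent_b, k, rng)
            children.append(mutate(child, n_rows, rate, rng))
        population = survivors + children
    return min(population, key=lambda ind: mean_gap(data, ind))


if __name__ == "__main__":
    column = [float(i) for i in range(100)]
    subset = select_subset(column, k=10)
    print(sorted(subset), mean_gap(column, subset))
```

In a setting like the one described, the selected rows would then be handed to the AutoML tool in place of the full dataset, with a shorter refinement run on the full data afterwards.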